A Connectionist Treatment of Grammar
for Generation: Relying on Emergents

Nigel  Ward 

Computer  Science  Division 
University of California at Berkeley


Abstract 

Parallel treatment of syntactic considerations in generation
promises quality and speed. Parallelism should be used not only
for simultaneous processing of several sub-parts of the output, but
even within single parts. If both types of parallelism are used with
incremental generation it becomes unnecessary to build up and
manipulate representations of sentence structure: the syntactic form
of the output can be emergent.

FIG is a structured connectionist generator built in this way.
Constructions and their constituents are represented in the same
network which encodes world knowledge and lexical knowledge.
Grammatical output results from synergy among many constructions
simultaneously active at run-time. FIG incorporates new
ways of handling constituency, word order and optional
constituents, and simple ways to avoid the problems of instantiation
and binding. Syntactic knowledge is expressed in a simple, readable
form; this representation straightforwardly defines parts of the
network.

1  Introduction 

Generation research has not yet fully identified the advantages
offered by parallelism nor the techniques necessary to
take advantage of it. This is especially true for the syntactic
aspects of generation.

This paper presents a way to exploit parallelism for syntax
in generation. The key points are: Syntactic constructions
are encoded in the same knowledge network as words
and concepts. Many constructions are active in parallel;
there is synergy, and sometimes competition. The syntactic
form of the output emerges from interactions among
constructions at run-time; explicit syntactic choice and building
up of representations of syntactic structure are unnecessary.

To see that this approach works for syntactically non-trivial
examples, consider that FIG’s outputs include: “once
upon a time there lived an old man and an old woman,”
“one day the old man went into the hills to gather wood,”
“a big peach bobbed down towards an old woman from upstream,”
“an old woman gave a peach to an old man,”
“John broke a dish,” “John made the cake vanish,”
and “Mary was killed;” and when producing Japanese:
“mukashi mukashi aru tokoro ni ojiisan to obaasan ga
sunde imashita,” “aru hi ojiisan wa yama e shibakari ni
ikimashita,” “kawakami kara ookii momo ga donburiko
donburako to obaasan e nagarete kimashita,” “ojiisan wa
meeri ni momo o agemashita,” and “meeri o koroshimashita.”

^Thanks to Daniel Jurafsky, Robert Wilensky, Dekai Wu, and Terry
Regier. This research was sponsored by the Defense Advanced Research
Projects Agency (DoD), monitored by the Space and Naval Warfare
Systems Command under N00039-88-C-0292, and the Office of Naval
Research under contract N00014-89-J-3205. An early version of this paper
appears in the Proceedings of the 12th Cognitive Science Conference,
Erlbaum, 1990.

Section 2 discusses parallelism in syntax and presents the
basic proposal. Section 3 presents a framework for connectionist
generation, and Section 4 elaborates the proposal in
this framework. Sections 5 through 8 discuss an implementation
of these ideas: Section 5 presents a representation for
grammatical knowledge, Section 6 explains how the proposal
accounts for specific syntactic phenomena, Section 7
presents an example of the generator in action, and Section
8 discusses general implementation issues. Section 9 summarizes.

2  Parallel  Syntax 

This section discusses two types of parallelism for syntax,
proposes that a generator should have both of them, and
sketches out the advantages of such an approach.

Natural language generation research traditionally assumed
that syntactic choices are made in a fixed (and generally
top-down) order. Yet, for incremental generation at
least, it is clear that a fixed order of decisions is not appropriate.
This realization has led to generators which work on
several parts of the input in parallel, simultaneously building
several sub-trees. Recent work in this area includes
(De Smedt 1990) and (Finkler & Neumann 1989). I will
refer to this type of parallelism as ‘part-wise’ parallelism.

A second kind of parallelism involves using several constructions
to generate even one part of the output. As far
as I know, this ‘within-part’ parallelism has not been proposed
in the generation literature. It has proven useful in linguistics.
In Fillmore’s Construction Grammar the syntactic
structure of sentences is accounted for in terms of ‘superimposition’
of constructions (Fillmore 1989b). It has also been
used in psycholinguistics, where analysis of speech errors
suggests that even normal speech is the result of competing
‘plans’ (Baars 1980). More specifically, (Stemberger 1985)
suggested that human speakers can be modeled as having
many ‘phrase structure units’ being ‘partially activated’
simultaneously. That is, many syntactic alternatives for
expressing some piece of meaning are considered in parallel.

I  propose  that  a  generator  should  exploit  both  part-wise 
and  within-part  parallelism. 

Parallel generation is a good idea for several reasons:

1.  It has been observed that part-wise parallelism is a good
way to improve the speed of response, especially for incremental
generation.

2.  Part-wise parallelism is also useful for handling dependencies.
It is not always the case that one part can be processed without
consideration of the way the surrounding utterance will turn out.
If the various parts are generated in parallel then knowledge about
the probable output for one part is available for consideration when
building another part. This can lead to better quality.

3.  Given the possibility of constraints among the various syntactic
choices involved in building an utterance, there is the
possibility that a ‘first choice’ will not work out when the
larger context is considered. This suggests within-part parallelism,
so that a generator has available alternative ways
to realize some information. Given this, it can find a set of
choices that satisfies all the dependencies, resulting in a
consistent and natural utterance.

4.  If a generator is indeed to consider all the possible
dependencies among choices, then parallelism becomes necessary
to cope with the amount of computation required.

5.  Parallelism is the natural way to generate if the input is
very complex (Ward 1989a).

3  The  FIG  Approach  to  Generation 

Reduced to bare essentials, a generator’s task is to get
from concepts (what the speaker wants to express) to words
(what he can say). On this view, the key problem in generation
is computing the relevance (pertinence) of a particular
word, given the concepts to express. Syntactic and other
knowledge mediates this computation of relevance.

Accordingly  FIG  is  based  on  word  choice  —  every  other 
consideration  is  analyzed  in  terms  of  how  it  affects  word 
choice. 

FIG is based on a large semantic network. Words are
nodes in the network; the activation they receive represents
evidence for their relevance. The basic FIG algorithm is:

1.  each  node  of  the  input  is  a  source  of  activation 

2.  activation  flows  through  the  network 

3.  when  the  network  settles,  the  most  highly  activated 
word  is  selected  and  emitted 

4.  activation levels are updated to represent the new current
state

5.  steps 2 through 4 repeat until all of the input has been
conveyed

Thus  FIG  is  an  incremental  generator.  Its  network  must 
be  designed  so  that,  when  it  settles,  the  node  which  is  most 
highly  activated  corresponds  to  the  best  next  word.  This 
paper  discusses  only  the  network  structures  which  encode 
syntactic  knowledge. 
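The five steps above can be sketched in a few lines of Python. This is only an illustrative toy, not FIG's code: the network, the link weights, the names `spread` and `generate`, the fixed number of propagation rounds standing in for "settling", and the simplified update step (a concept stops being a source once a word expressing it is emitted) are all assumptions. Only the initial activation of 11 units per input node is taken from the example in Section 7. The toy contains no syntactic nodes, so word order here comes from link weights alone.

```python
from collections import defaultdict


def spread(links, sources, rounds=3):
    """Push activation from the source nodes along weighted links.

    A fixed number of rounds stands in for 'settling' (an assumption).
    """
    activation = defaultdict(float)
    for node, amount in sources.items():
        activation[node] += amount
    frontier = dict(sources)
    for _ in range(rounds):
        arriving = defaultdict(float)
        for node, amount in frontier.items():
            for target, weight in links.get(node, []):
                arriving[target] += amount * weight
        for node, amount in arriving.items():
            activation[node] += amount
        frontier = arriving
    return activation


def generate(links, expresses, input_concepts, initial=11.0, max_words=20):
    """Steps 1-5: emit the most activated word, update, repeat."""
    sources = {concept: initial for concept in input_concepts}  # step 1
    output = []
    while sources and len(output) < max_words:
        activation = spread(links, sources)                     # step 2
        best = max(expresses, key=lambda w: activation[w])      # step 3
        output.append(best)
        # Step 4 (simplified): the concept just expressed is no longer a source.
        sources.pop(expresses[best], None)
    return output                                               # step 5: loop ended


# Hypothetical toy network: concept -> [(word, link weight)].
LINKS = {
    "old-womanc": [("old-womanw", 0.9)],
    "goc": [("go-w", 0.6)],
    "streamc": [("streamw", 0.3)],
}
# Hypothetical word -> concept table used by the simplified update step.
EXPRESSES = {"old-womanw": "old-womanc", "go-w": "goc", "streamw": "streamc"}

print(generate(LINKS, EXPRESSES, ["old-womanc", "goc", "streamc"]))
```

In this sketch nothing ever "chooses" an ordering explicitly; the sequence of words falls out of repeatedly taking the maximum, which is the property the rest of the paper relies on.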

Elsewhere I argue that FIG points the way to accurate
and flexible word choice (Ward 1988), producing natural-sounding
output for machine translation (Ward 1989c), and
modeling the key aspects of the human language production
process (Ward 1989a).

4  Connectionist  Syntax:  Overview 

In FIG, constructions and constituents are also represented
as nodes in the knowledge network. Their activation levels
represent their current relevance. They interact with other
nodes by means of activation flow. Any number of constructions
can be simultaneously active. This handles part-wise
parallelism, competition, and superimposition.

Syntactic considerations manifest themselves only
through their effects on the activation levels of words (directly
or indirectly). An utterance is simply the result of
successive word choices. FIG does produce grammatical
sentences, most of the time, but their ‘syntactic structure’ is
emergent, a side-effect of expressing the meaning. Thus we
can say that the syntactic form of utterances is emergent in
FIG^. This point will be illustrated repeatedly in Section 6.

Mechanisms developed by linguists (and often adopted
by generation researchers), such as unification, are not directed
to the task of generation (or parsing) so much as to
the goal of explaining sentence structure. Accounting for
the structure of sentences may be a worthwhile goal for linguistics,
but building syntactic structures is not necessary
for language generation, as subsequent sections will show.

The most common metaphor for generation is that of
making choices among alternatives. For example, a generator
may choose among words for a concept, among ways
to syntactically realize a constituent, and among concepts
to bind to a slot. Given this metaphor, organizing choices
becomes the key problem in generator design. Attempts
to build parallel generators while retaining the notion of
explicit choice run up against problems of sequencing the
choices or of doing bookkeeping so that the order of choices
can vary. This appears to be difficult, judging by the general
paucity of published outputs in descriptions of parallel
generators. On the other hand, relying on emergents means
there are no explicit choices to worry about, and thus there
are no problems of ordering or bookkeeping at all (Ward
1989b).

^Post hoc examination of FIG output might make one think, for example,
‘this exhibits the choice of the existential-there construction.’ In FIG
there is indeed an inhibit link between the nodes ex-there and subj-pred,
and so when generating the network tends to reach a state where only one of
these is highly activated. The most highly activated construction can have
a strong effect on word choices, which is why the appearance of syntactic
choice arises.

    (defp noun-phr
      (constituents (np-1 obl article ((article 1.2)))
                    (np-2 opt adjective ((adjective .28)))
                    (np-3 obl noun ((cnoun .47))) ))

Figure 1: Representation of the English Noun-Phrase Construction

    (defp go-p
      (constituents (gp-1 obl go-w ((go-w .2)))
                    (gp-2 opt epart ((vparticle .6) (directionr .2)))
                    (gp-3 opt noun ((prep-phr .6) (destinationr .2)))
                    (gp-4 opt verb ((purpose-clause .7) (purposer .2))) ))

Figure 2: Representation of the Valence of “Go”

    (defp ex-there
      (inhibit subj-pred passive)
      (constituents (et-1 obl therew ((therew .5)))
                    (et-2 obl verb ((verb .5)))
                    (et-3 obl noun ((noun .3))) ))

Figure 3: Representation of the Existential “There” Construction

In FIG all types of knowledge are represented uniformly
in the network, and interact freely at run time. FIG not only
allows this kind of interaction among various considerations
when generating, it relies on it. It relies on synergy among
constructions in the same way that Construction Grammar
does. It relies on synergy between semantic and syntactic
considerations, as seen below in Section 6.7. It also enables
interaction among lexical choices and syntactic considerations.

5  Knowledge  of  Syntax 

This  section  presents  FIG’s  representation  of  knowledge, 
first  presenting  it  in  a  declarative  form  then  showing  how 
that  representation  maps  into  network  structures. 

Starting with this section I will be largely describing
FIG-as-implemented, as of May 1990. This is for the sake of
concreteness.  The  theory,  however,  is  intended  to  apply 
to  parallel  generators  in  general.  Moreover,  the  syntactic 
knowledge  presented  in  this  section  is  purely  illustrative.  I 
do  not  claim  that  these  represent  the  facts  of  English,  nor 
the  best  way  to  describe  them  in  a  grammar.  In  particular, 
many  generalizations  are  not  captured.  The  examples  are 
intended  simply  to  illustrate  the  representational  tools  and 
computational  mechanisms  available  in  FIG.  Many  details 
are  left  unexplained  for  lack  of  space. 

Figure 1 shows FIG’s definition of noun-phr, representing
the English noun-phrase construction. This construction
has three constituents: np-1, np-2, and np-3. np-1 and np-3
are obligatory, np-2 is optional. Glossing over the details
for the moment, the list at the end of each constituent’s
definition specifies how to realize the constituent. For example,
np-1, np-2, and np-3 should be realized as an article,
adjective, and noun, respectively.

Figure 2 shows the construction for the case frame of the
word “go.” First comes go-w, for the word “go,” which is
obligatory. Next come (optionally): a verb-particle representing
direction (as in “go away” or “go back home” or
“go down to the lake”), a prepositional phrase to express
the destination, and a purpose clause.

Figure 3 shows the representation of the existential
“there” construction, as in “there was a poor cobbler.” The
‘inhibit’ field indicates that this construction is incompatible
with the passive construction and also with subj-pred,
the construction responsible for the basic SVO ordering of
English.

Figure 4 shows knowledge about when and where constructions
are relevant. Briefly, constructions are associated
with words, with concepts, and with other constructions.

Constructions are associated with the meanings they can
express. For example, ex-there is listed under the concept
introductory, representing that this construction is appropriate
for introducing some character into the story, and
purpose-clause is listed as a way to express the purposer
relation.

Constructions are associated with words. For example,
go-p is the ‘valence’ (case frame) of go-w and noun-phr is
the ‘maximal’ of cnoun.

Constructions are also associated with other constructions.
For example, the fourth constituent of go-p subcategorizes
for purpose-clause (Figure 2); and there are negative
associations among incompatible constructions, for example
the ‘inhibit’ link between ex-there and subj-pred
(Figure 3).

    (defw peachw
      (smallcat cnoun) (expresses momoc) (grapheme "peach") (english (consnt-initial .5)) )

    (defs cnoun (bigcat noun .4) (maximals (noun-phr .4))) ; common-noun

    (defw go-w (cat verb) (expresses ikuc) (valence (go-p .2))
      (grapheme (inf "go") (past "went") (pastp "gone") (presp "going")) )

    (defc introductoryc (properties persistent) (english (ex-there .2) ))

    (defr purposer (english (to2w .4) (purpose-clause .1)) (Japanese (ni-w .6)))

Figure 4: Some Knowledge Related to Constructions

Figure 5: A Fragment of the Network (diagram not reproduced; surviving node label: consnt-initial)

Figure 5 shows a fragment of FIG’s network, where the
numbers on the links are their weights. This is partially
specified by the knowledge shown in the previous figures.
The mapping from s-expressions to network structures is not
quite trivial. For example, the link from noun to peachw
comes from the statements that peachw has ‘subcat’ cnoun
and that cnoun has ‘bigcat’ noun. Similarly, the link from
peachw to noun-phr is inherited by peachw from the
‘maximals’ information on cnoun.
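The derivation of links from declarations can be illustrated with a small sketch. It is not FIG's actual mapping code: the dictionary layout, the function name `derive_links`, and the choice to reuse the ‘bigcat’ weight of .4 on the noun-to-peachw link are assumptions made for illustration. The field name ‘smallcat’ follows Figure 4 (the text above calls it ‘subcat’).

```python
# Declarations transcribed from Figure 4 into plain Python dicts
# (the dict layout itself is a hypothetical encoding).
WORDS = {"peachw": {"smallcat": "cnoun"}}
CATEGORIES = {
    "cnoun": {"bigcat": ("noun", 0.4), "maximals": [("noun-phr", 0.4)]},
}


def derive_links(words, categories):
    """Return (source, target, weight) links implied by the declarations."""
    links = []
    for word, info in words.items():
        category = categories[info["smallcat"]]
        big_category, weight = category["bigcat"]
        # noun -> peachw: follows from smallcat cnoun plus bigcat noun.
        # Reusing the bigcat weight here is an assumption.
        links.append((big_category, word, weight))
        # peachw -> noun-phr: inherited from the 'maximals' of cnoun.
        for construction, max_weight in category["maximals"]:
            links.append((word, construction, max_weight))
    return links


for link in derive_links(WORDS, CATEGORIES):
    print(link)
```

The point of the sketch is only that a word's links need not be written out per word; they can be computed once from the category-level statements.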

6  Various  Syntactic  Phenomena 

6.1  Constituency 

The links described above suffice to handle constituency.
Consider for example the fact that common nouns must be
preceded by articles in FIG’s subset of English. Suppose
that peachw is activated, perhaps because a peachc concept
is in the input. Activation flows from peachw via noun-phr,
np-1, and article to a-w and the-w.

In this way the relevance of a noun increases the relevance
rating of articles. Provided that other activation levels
are appropriate, this will cause some article to become the
most highly activated word, and thus be selected and emitted.
Note that FIG does not first choose to say a noun, then
decide to say an article; rather, these ‘decisions’ emerge
as activation levels settle.
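The path from peachw to the articles can be traced numerically. The sketch below assumes that activation is simply multiplied by each link weight along a path, which the paper does not state; the function name `path_activation` is hypothetical. The weights .4 (peachw to noun-phr) and 1.2 (np-1 to article) come from Figures 4 and 1, the weight 1.0 on the noun-phr to np-1 link is the cursor mask value described in Section 6.3, and the weight 1.0 on the article to the-w link is an assumed placeholder.

```python
from math import prod


def path_activation(source_activation, weights):
    """Activation arriving at the end of a path, assuming simple multiplication."""
    return source_activation * prod(weights)


# (source, target, weight) for each link on the path peachw -> ... -> the-w.
PATH = [
    ("peachw", "noun-phr", 0.4),   # 'maximals' weight, Figure 4
    ("noun-phr", "np-1", 1.0),     # cursor on np-1 (Section 6.3)
    ("np-1", "article", 1.2),      # Figure 1
    ("article", "the-w", 1.0),     # assumed; not given in the paper
]

arriving = path_activation(10.0, [weight for _, _, weight in PATH])
print(f"activation reaching the-w: {arriving:.2f}")
```

Under these assumptions a strongly activated noun passes a substantial share of its activation on to the articles, with no separate step in which an article is "decided on".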

Any node can be mentioned by a constituent, thus constructions
can specify: which semantic elements to include
(metonymies), what order to mention things in, what function
words to choose, and what inflections to use.

6.2  Subcategorization 

Consider the problem of specifying where a given concept
should appear and what syntactic form it should take.
In FIG this is handled by simultaneously activating a concept
node and a syntactic construction or category node. For
example, the second constituent of go-p (Figure 2) specifies
that ‘the direction of the going’ be expressed as a ‘verbal
particle.’ Activation will thus flow to an appropriate word
node, such as downw, both via the concept filling the
directionr slot and via the syntactic category vparticle. Thanks
to this sort of activation flow FIG tends to select and emit an
appropriate word in an appropriate form (Ward 1988). Government,
for example the way that some verbs govern case markers, is
handled in the same way.

6.3  Word  Order 

In an incremental connectionist generator, at each time
the activation level of a word must represent its current relevance.
In particular, words which are currently syntactically
appropriate must be strongly activated. In FIG the representation
of the current syntactic state is distributed across the
constructions. There is no central process which plans or
manipulates word order; each construction simply operates
independently. More highly activated constructions send
out more activation, and so have a greater effect. But in the
end, FIG just follows the simple rule, ‘select and emit the
most highly activated word.’ Thus word order is emergent.

In FIG the current syntactic state is encoded in constructions’
activation levels and ‘cursors.’ The cursor of a construction
points to the currently appropriate constituent and
ensures that it is relatively highly activated. To be specific,
the cursor gives the location of a ‘mask’ specifying the
weights of the links from the construction to constituents.
The mask specifies a weight of 1.0 for the constituent under
the cursor, and for subsequent constituents a weight proportional
to their closeness to the cursor. (Subsequent constituents
must receive some activation so that there is part-wise
parallelism.) (For unordered constructions the weights
on all construction-constituent links are the same.)
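A minimal sketch of such a mask follows. The paper says only that later constituents get a weight "proportional to their closeness to the cursor"; the particular falloff 1/(1 + distance), the weight 0.0 for constituents already passed, the uniform value 1.0 for unordered constructions, and the function name `cursor_mask` are all assumptions.

```python
def cursor_mask(n_constituents, cursor, ordered=True):
    """Link weights from a construction to each of its constituents.

    cursor is the 0-based index of the currently appropriate constituent.
    """
    if not ordered:
        # Unordered construction: every link has the same weight (value assumed).
        return [1.0] * n_constituents
    mask = []
    for index in range(n_constituents):
        if index < cursor:
            mask.append(0.0)                           # already passed (assumed)
        else:
            mask.append(1.0 / (1 + index - cursor))    # 1.0 under the cursor
    return mask


# noun-phr has three constituents: np-1, np-2, np-3.
print(cursor_mask(3, 0))   # cursor on np-1
print(cursor_mask(3, 1))   # cursor on np-2
```

With the cursor on np-2, np-3 still receives half of the weight in this sketch, which is what lets a noun win when no adjective is semantically supported.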

For example, when the cursor of noun-phr points to np-1,
articles receive a large proportion of the activation of
noun-phr. Thus, an article is likely to be the most highly
activated word and therefore selected and emitted. After an
article is emitted the cursor is advanced to np-2, and so on.
Advancing cursors is described in Section 6.5.

In accordance with the intuition that a word is not truly
appropriate unless it is both syntactically and semantically
appropriate, the activation level for words is given by the
product (not the sum) of incoming syntactic and semantic
activation, where ‘syntactic activation’ is activation received
from constituents and syntactic categories. The
problem with simply summing is that it results in the
network often being in a state where many word-nodes have
nearly equal activation, which makes the behavior oversensitive
to minor changes in link weights.
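The difference between the two combination rules can be seen on a small made-up example. The activation values and the function name `word_activation` below are hypothetical; only the product rule itself comes from the text.

```python
def word_activation(syntactic, semantic, combine="product"):
    """Combine a word's incoming syntactic and semantic activation."""
    if combine == "product":
        return syntactic * semantic
    return syntactic + semantic


# Hypothetical (syntactic, semantic) activation for three candidate words.
CANDIDATES = {
    "the-w": (3.0, 3.0),     # moderately supported on both counts
    "peachw": (1.0, 5.0),    # semantically strong, syntactically weak
    "bobbedw": (5.0, 1.0),   # syntactically strong, semantically weak
}

by_product = {w: word_activation(s, m) for w, (s, m) in CANDIDATES.items()}
by_sum = {w: word_activation(s, m, combine="sum") for w, (s, m) in CANDIDATES.items()}

print("product:", by_product)
print("sum:    ", by_sum)
```

In this example summing leaves all three words tied, so a small change to any weight would flip the outcome, while the product clearly separates the word that is appropriate on both counts.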

6.4  Optional  Constituents 

When building a noun-phrase a generator should emit an
adjective if semantically appropriate, otherwise it should ignore
that option and emit a noun next. FIG does this without
additional mechanism.

To see this, suppose “the” has been emitted and the cursor
of noun-phr is on its second constituent, np-2. As a result
adjectives get activation, via np-2, and so to a lesser extent
do nouns via np-3. There are two cases: If the input includes
a concept linked (indirectly perhaps) to some adjective, that
adjective will receive activation from it. In this case the
adjective will receive more syntactic activation than any noun
does, and hence have more total activation, so it will be
selected next. If the input does not include any concept linked
to an adjective, then a noun will have more activation than
any adjective (since only the noun receives semantic activation
also), and so a noun will be selected next.

Most generators use some syntax-driven procedure to inspect
semantics and decide explicitly whether or not to realize
an optional constituent. In FIG, the decision to include
or to omit an optional constituent (or adjunct) is emergent:
if an adjective becomes highly activated it will be chosen,
in the usual fashion; otherwise some other word, most
likely a noun, will be.
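The two cases can be reproduced with a few lines of Python. The numbers are invented: with the cursor on np-2, the adjective is given syntactic activation 1.0 and the noun 0.5 (a smaller share via np-3), and semantic activation is 4.0 where the input supports the word and 0.0 where it does not. The function name `pick_next` is hypothetical; the product rule is the one from Section 6.3.

```python
def pick_next(candidates):
    """Return the word whose syntactic * semantic activation is highest.

    candidates maps each word to a (syntactic, semantic) pair.
    """
    return max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])


# Case 1: the input contains a concept linked to the adjective "big".
with_adjective = {"bigw": (1.0, 4.0), "peachw": (0.5, 4.0)}

# Case 2: no concept in the input is linked to any adjective.
without_adjective = {"bigw": (1.0, 0.0), "peachw": (0.5, 4.0)}

print(pick_next(with_adjective))
print(pick_next(without_adjective))
```

No test of the form "is there an adjective to realize?" appears anywhere in the sketch; the adjective is emitted or skipped purely as a consequence of which word ends up most activated.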

6.5  Updating Constructions

Recall that FIG, after selecting and emitting a word, updates
activation levels to represent the new state. There are
several aspects to this.

The cursors of constructions must advance as constituents
are completed. The update mechanism can ‘skip over’ ‘opt’
constituents, since, for example, if there are no adjectives,
the cursor of noun-phr should not remain stuck forever at
the second constituent. More than one construction may be
updated after a word is output; for example, emitting a noun
may cause updates to both the prep-phr construction and
the noun-phr construction.

Constructions which are ‘guiding’ the output should be
scored as more relevant. Therefore the update process
adds activation to those constructions whose cursors have
changed and sets temporary lower bounds on their activation
levels. Thus, even though FIG does not make any syntactic
plans, it tends to form a grammatical continuation of
whatever it has already output. After the last constituent of
a construction has been completed, the cursor is reset and
the lower bound is removed.
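The update step for a single construction might look roughly like the following. This is a sketch under stated assumptions, not FIG's implementation: the `Construction` class, the `update` function, the amounts `boost` and `floor`, and the rule that a word "completes" a constituent when its category matches are all hypothetical. The behaviors illustrated (advancing the cursor, skipping optional constituents, adding activation, setting and later removing a lower bound, resetting the cursor) are the ones described above.

```python
from dataclasses import dataclass


@dataclass
class Construction:
    name: str
    constituents: list          # (constituent name, "obl" or "opt", category)
    cursor: int = 0
    activation: float = 0.0
    lower_bound: float = 0.0


def update(construction, emitted_category, boost=5.0, floor=3.0):
    """Update one construction after a word of emitted_category is output.

    Returns True if the construction's cursor changed.
    """
    parts = construction.constituents
    index = construction.cursor
    # Skip over optional constituents that the emitted word does not realize.
    while (index < len(parts)
           and parts[index][2] != emitted_category
           and parts[index][1] == "opt"):
        index += 1
    if index == len(parts) or parts[index][2] != emitted_category:
        return False                      # word is irrelevant to this construction
    construction.cursor = index + 1       # advance past the completed constituent
    construction.activation += boost      # a 'guiding' construction becomes more relevant
    construction.lower_bound = floor      # temporary lower bound on activation
    if construction.cursor == len(parts): # last constituent completed
        construction.cursor = 0           # reset the cursor
        construction.lower_bound = 0.0    # remove the lower bound
    return True


noun_phr = Construction("noun-phr", [("np-1", "obl", "article"),
                                     ("np-2", "opt", "adjective"),
                                     ("np-3", "obl", "noun")])
update(noun_phr, "article")   # cursor moves to np-2
update(noun_phr, "noun")      # np-2 is skipped, np-3 completes, cursor resets
print(noun_phr)
```

In a fuller version this function would be applied to every construction after each emitted word, since one word can advance several constructions at once.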

Why is a separate update mechanism necessary? Most
generators simply choose a construction and ‘execute’ it
straightforwardly. However, in FIG no construction is ever
‘in control.’ For example, one construction may be strongly
activating a verb, but activation from other constructions
may ‘interfere,’ causing an adverbial, for example, to be
interpolated. Therefore constructions need this kind of feedback
on what words have been output.

6.6  No  Instantiation  or  Binding 

It is not obvious that notions of instantiation, binding,
embedding, or recursion are essential for the description of
natural language. Nor are mechanisms for these things essential
for the generation task, I conjecture. This subsection
considers a problem which is usually handled with instantiation
and shows how it can be handled more simply without.

Consider the problem of generating utterances with multiple
‘copies,’ for example, several noun phrases, or several
uses of “a.” Note that FIG as described so far would have
problems with this. For example, since all words of category
cnoun have links to noun-phr, that node might receive
more activation than appropriate, in cases when several
nouns are active. This could result in over-activation of
articles, and thus premature output of “the,” for example.

In fact FIG uses a special rule for activation received
across inherited links: the maximum (not the sum) of these
amounts is used. For example, this rule applies to the
‘maximal’ links from nouns to noun-phr, thus noun-phr effectively
‘ignores’ all but the most highly activated noun. (This
was not shown in Figure 5.)
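The effect of the maximum rule can be shown in a couple of lines. The function name `incoming_activation`, the example amounts, and the choice to add the inherited maximum to the sum of ordinary (non-inherited) inputs are assumptions; the paper specifies only that the maximum is taken over inherited links.

```python
def incoming_activation(direct, inherited):
    """Total input to a node: sum over ordinary links, max over inherited links."""
    inherited_part = max(inherited) if inherited else 0.0
    return sum(direct) + inherited_part


# Hypothetical amounts arriving at noun-phr from three active nouns
# over their inherited 'maximal' links.
from_nouns = [4.0, 2.0, 1.0]

print(incoming_activation([], from_nouns))   # maximum rule
print(sum(from_nouns))                       # what plain summing would give
```

With the maximum rule the input to noun-phr is the same whether one noun or several are active, so articles are not over-activated when the input mentions many nouns.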


Figure 6: An Input to FIG (diagram not reproduced; surviving labels: agentr, purposer, pastc)

Figure 7: Selected Paths of Activation Flow Just Before Output of “the”


An earlier version of FIG handled this by actually making
copies. For example, it would make a copy of noun-phr
for each noun-expressible concept, and bind each copy
to the appropriate concept, and to copies of a-w and the-w.
This worked but it made the program hard to extend.
In particular, it was hard to choose weights such that the
network would behave properly both before and after new
nodes were instantiated and linked in.

6.7  Low-level  Coherence 

Words must stand in the correct relations to their neighbors.
For example, a generator must not produce “the big
man went to the mountain” when the input calls for “the
man went to the big mountain.” This is the problem of emitting
the right adjective at the right time, or, in other words,
only emitting adjectives that stand in an appropriate relation
to the head noun.

Most generators handle this easily with structure-mapping
or pointer following. For example, a syntax-directed
generator may, whenever building a noun phrase,
traverse the ‘modified-by’ pointer to find the item to turn
into an adjective. FIG, however, eschews structure manipulation
and pointer following. Like all connectionist approaches,
therefore, it is potentially subject to problems with
crosstalk.

The way to avoid this is to ensure that related concepts become
highly activated together. In the example, bigc should
become activated together with mountainc, not together
with old-manc. Using a more elaborate terminology, this
means that there should be some kind of ‘focus of attention’
(Chafe 1980), which successively ‘lights up’ groups of related
nodes.

This condition is met in FIG, thanks to the links among
the nodes of the input. For example, if mountaincl is linked
by a sizer link to bigcl, then bigcl will tend to become
highly activated whenever mountaincl is. Thus, when
old-mancl is the most highly activated concept-node, bigcl
will only receive energy from it indirectly (via an
inverse-agentr link, a locationr link, and a sizer link) and thus will
not be activated sufficiently to interfere early in the
sentence.
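
The attenuation argument above can be made concrete with a small sketch. It is an illustration only, not FIG's implementation: the uniform link weight of 0.5, the rule that activation is multiplied by the weight of every link it crosses, the node name gocl, and the helper name `received` are all assumptions.

```python
# Hypothetical weights; the paper does not give values for these links.
LINKS = {
    ("old-mancl", "gocl"): 0.5,       # inverse-agentr link (assumed weight)
    ("gocl", "mountaincl"): 0.5,      # locationr link (assumed weight)
    ("mountaincl", "bigcl"): 0.5,     # sizer link (assumed weight)
}


def received(source, links, energy=1.0):
    """Return the strongest activation reaching each node from `source`.

    Links are treated as bidirectional and activation is scaled by the
    weight of each link crossed (an assumed rule, for illustration).
    """
    neighbours = {}
    for (a, b), weight in links.items():
        neighbours.setdefault(a, []).append((b, weight))
        neighbours.setdefault(b, []).append((a, weight))

    best = {source: energy}
    frontier = [source]
    while frontier:
        next_frontier = []
        for node in frontier:
            for other, weight in neighbours.get(node, []):
                value = best[node] * weight
                if value > best.get(other, 0.0):
                    best[other] = value
                    next_frontier.append(other)
        frontier = next_frontier
    return best


from_mountain = received("mountaincl", LINKS)
from_old_man = received("old-mancl", LINKS)
print(from_mountain["bigcl"], from_old_man["bigcl"])
```

Under these assumed weights bigcl receives four times as much energy when mountaincl is the source as when old-mancl is, because the second path crosses three links rather than one.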

7  Example 

This section describes how FIG produces “the old woman
went to a stream to wash clothes.” For this example
the input is the set of nodes gocl, old-womancl,
wash-clothescl, streamcl, and pastc, linked together as shown
in Figure 6. (The names of the concepts have been
anglicized for the reader’s convenience. Boxes are drawn
around nodes in the input so that they can be easily
identified in subsequent diagrams.)

Initially  each  node  of  the  input  has  11  units  of  activation. 
After  activation  flows,  before  any  word  is  output,  the  most 
highly  activated  word  node  is  the-w,  primarily  for  the  rea¬ 
sons  shown  in  Figure  7.  Figure  8  shows  the  activation  levels 
of  selected  nodes. 
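
Choosing the next word can be pictured as taking the most highly activated word node. The sketch below uses the word activations reported in Figure 8; the function name `select_word` and the use of a plain maximum are assumptions, since the paper does not show FIG's selection code.

```python
# Word-node activations just before output of "the", as listed in Figure 8.
WORD_ACTIVATION = {
    "the-w": 29.7,
    "a-w": 21.0,
    "old-womanw": 18.5,
    "streamw": 13.3,
    "riverw": 10.7,
    "go-w": 10.0,
    "wash-clothesw": 7.5,
}


def select_word(activation):
    """Return the name of the most highly activated word node."""
    return max(activation, key=activation.get)


print(select_word(WORD_ACTIVATION))
```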

After “the” is emitted, the update mechanism activates
noun-phr and advances its cursor to np-2. The most highly
activated word becomes old-womanw, largely due to
activation from np-3.

After “old woman” is emitted, noun-phr is reset; that
is, the cursor is set back to np-1 and it thereby becomes
ready to guide production of another noun phrase. Also,
now the cursor on subj-pred advances to sp-2. As a result
verbs, in particular go-w, become highly activated.
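
The cursor bookkeeping in these two steps can be sketched as follows. This is a hedged illustration rather than FIG's update mechanism: the class `Construction` and its methods are hypothetical, and the sketch tracks only cursor positions, not activation levels.

```python
from dataclasses import dataclass


@dataclass
class Construction:
    """A construction with ordered constituents and a cursor (hypothetical)."""
    name: str
    constituents: list
    cursor: int = 0

    def current(self):
        """Name of the constituent the cursor is on."""
        return self.constituents[self.cursor]

    def advance(self):
        """Move the cursor to the next constituent, if there is one."""
        if self.cursor < len(self.constituents) - 1:
            self.cursor += 1

    def reset(self):
        """Set the cursor back to the first constituent."""
        self.cursor = 0


noun_phr = Construction("noun-phr", ["np-1", "np-2", "np-3"])
subj_pred = Construction("subj-pred", ["sp-1", "sp-2"])

trace = []
noun_phr.advance()            # after "the" is emitted
trace.append(noun_phr.current())
noun_phr.reset()              # after "old woman" is emitted
subj_pred.advance()
trace.append(noun_phr.current())
trace.append(subj_pred.current())
print(trace)
```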




---------- PATTERNS ----------   ------- WORDS -------   ------ CONCEPTS ------
15.6  SUBJ-PRED                  29.7  THE-W             19.7  OLD-WOMANCl
      SP-1 sp-2                  21.0  A-W               15.0  IKUCl
 7.6  CAUSATIVEP                 18.5  OLD-WOMANW        14.0  KAWACl
      CP-1 cp-2 cp-3             13.3  STREAMW           13.2  SENTAKUCl
 6.6  NOUN-PHR                   10.7  RIVERW            11.0  PASTC
      NP-1 np-2 np-3             10.0  GO-W               8.3  VOWEL-INITIAL
 1.8  GO-P                        7.5  WASH-CLOTHESW      6.1  CONSNT-INITIAL
      GP-1 gp-2 gp-3 gp-4         3.9  TO1W               5.8  TOPICC
 1.4  PURPOSE-CLAUSE              3.2  MAKEW             ------- OTHER --------
      PC-1 pc-2 pc-3              2.9  TOWARDSW          13.4  CAUSER
 0.2  PREP-PHR                    2.9  INTOW             10.4  AGENTR
      PP-1 pp-2                   2.5  TO2W               6.9  ARTICLE
                                  0.4  WITHW              4.5  NOUN

Figure 8: Activation Levels of Selected Nodes Just Before Output of “the”


Figure 9: Selected Paths of Activation Flow Just Before Output of “to”


go-w is selected. Because pastc has more activation than
presentc, infinitivec and so on, go-w is inflected and
emitted as “went” (the inflection mechanism is not described in
this paper). The cursor of go-p advances to its second
constituent, and so it activates directional particles, although
there is no semantic input to any such word in this case.
to1w becomes the most highly activated word, primarily for
the reasons shown in Figure 9.

After “to” is emitted, the cursor of prep-phr is advanced.
The key path of activation flow is now from the second
constituent of prep-phr to noun to streamw to noun-phr to
article to a-w. Thus a-w is selected. The inflection
mechanism produces “a” not “an” since consnt-initial is more
highly activated than vowel-initial.
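
The comparison that decides between “a” and “an” can be sketched in a few lines. The inflection mechanism itself is not described in this paper, so the function `article_form` and the activation values below are purely hypothetical; only the rule that the more highly activated of consnt-initial and vowel-initial decides the form comes from the text.

```python
def article_form(activation):
    """Pick the surface form of a-w from two node activations (sketch)."""
    if activation["consnt-initial"] > activation["vowel-initial"]:
        return "a"
    return "an"


# Hypothetical activation levels at the point where a-w is selected.
before_stream = {"consnt-initial": 9.0, "vowel-initial": 4.0}
print(article_form(before_stream))
```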

Then the cursor of noun-phr advances and “stream” is
emitted. After this the cursor of go-p advances to gp-4.
From this constituent activation flows to purpose-clause,
and in due course “to” and “wash clothes” are emitted.

Now that all the nodes of the input are expressed, FIG
ends, having produced “the old woman went to a stream to
wash clothes.”

8  About  the  Implementation 

I have used a connectionist model because it is a good
way to explore interactivity, parallelism, and emergents, not
because of fondness for connectionism-for-its-own-sake.


Thus I have not attempted to develop a distributed
connectionist model. Distributed models do have various
advantages, such as elegant handling of generalizations and
the potential for learning. Yet the current state of PDP
technology does not seem up to building an interactive model of
a complex task like language generation. I therefore
developed FIG as a structured (localist) connectionist system.

I have also not attempted to make FIG a ‘pure’
connectionist model. For example, updating constructions is
currently done by a special process that goes in and changes
activation levels and moves the cursor. (This process uses
the third elements in the constituent descriptions of Figures
1-3, not previously discussed.) FIG could be made more
‘pure’ by doing this connectionistically, perhaps by adding
new nodes with special properties. But this change would
not improve FIG’s performance, since there seems no need
for the update process to interact with the other processes.

A  connectionist  model  of  computation  allows  parallelism 
and  emergents,  but  it  certainly  does  not  require  them.  In¬ 
deed,  other  generators  built  using  structured  connection- 
ism  (Kalita  &  Shastri  1987;  Gasser  1988;  Kitano  1989; 
Stolcke  1989)  do  not  appear  to  exploit  parallelism  much,  nor 
do  they  exhibit  emergent  properties.  For  example,  Gasser’s 
CHIE  relies  heavily  on  winner-take-all  subnetworks,  which 
cuts  down  on  the  amount  of  effective  parallelism.  Also,  far 
from exploiting emergents, CHIE uses ‘neuron firings’ to
model syntactic choices; these happen sequentially and the
exact order and timing of firings seem crucial.

Currently FIG has about 350 nodes and 1000 links. Before
each word choice, activation flows until the network
settles down, with cutoff after 9 cycles. This takes about 0.2
seconds per word on average, simulating parallel activation
flow on a Symbolics 3670 (1.6 seconds on a Sun 3/140).
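
A settle-with-cutoff loop of this kind might look like the sketch below. Only the 9-cycle cutoff comes from the text; the update rule (decayed old activation plus external input plus weighted input from neighbours), the decay of 0.5, the tolerance, and the function name `settle` are assumptions made for illustration.

```python
MAX_CYCLES = 9  # cutoff stated in the text


def settle(activation, links, external, decay=0.5, tolerance=0.01,
           max_cycles=MAX_CYCLES):
    """Let activation flow until it stops changing or the cutoff is hit.

    `links` maps (source, target) pairs to weights. Returns the final
    activations and the number of cycles used. The update rule is an
    assumed one, not FIG's.
    """
    current = dict(activation)
    for cycle in range(1, max_cycles + 1):
        updated = {}
        for node in current:
            incoming = sum(current[src] * weight
                           for (src, dst), weight in links.items()
                           if dst == node)
            updated[node] = (decay * current[node]
                             + external.get(node, 0.0) + incoming)
        change = max(abs(updated[n] - current[n]) for n in current)
        current = updated
        if change < tolerance:
            return current, cycle
    return current, max_cycles


# Toy network: one input node receiving 11 units feeds one other node.
toy_links = {("input", "word"): 0.5}
start = {"input": 0.0, "word": 0.0}
levels, cycles = settle(start, toy_links, {"input": 11.0})
print(cycles, levels)
```

In this toy network the activations are still changing by more than the tolerance after 9 cycles, so the cutoff rather than convergence ends the loop.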

The correct operation of FIG depends on having correct
link weights. I have no theory of weights; indeed, finding
appropriate ones is still largely an empirical process.
However, there are regularities: for example, all ‘inhibit’ links
have weight .7, almost all links from syntactic categories
to their members have weight .5, and so on. Many of the
weights have a rationale: for example, the link from np-1
to articles has a relatively high weight because articles get
very little activation from other sources. No single weight
is meaningful; the way it functions in context is. For
example, the exact weight of the link from the first constituent of
subj-pred to noun is not crucial, as long as the product of
it and the weight on the agentr relation is appropriate.
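
The point about products can be checked with a line of arithmetic. The sketch below assumes, for illustration only, that the combined effect of two weights is their product; the weight values and the name `combined_gain` are hypothetical.

```python
import math


def combined_gain(weights):
    """Product of a sequence of link weights (assumed combination rule)."""
    return math.prod(weights)


# Two hypothetical settings of (constituent-to-noun weight, agentr weight).
setting_a = (0.4, 1.0)
setting_b = (0.8, 0.5)
print(combined_gain(setting_a), combined_gain(setting_b))
```

Both settings give the same product, so under this assumption either would serve equally well, which is why neither weight is meaningful on its own.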

FIG’s knowledge is, of course, very limited. Adding new
concepts,  words  or  constructions  is  generally  straightfor¬ 
ward;  they  can  be  encoded  by  analogy  to  similar  nodes,  and 
usually  the  same  link  weights  suffice.  Occasionally  new 
nodes  and  links  interact  with  other  knowledge  in  the  system 
in  unforeseen  ways,  causing  other  nodes  to  get  too  much 
or  too  little  activation.  In  these  cases  it  is  necessary  to  de¬ 
bug  the  network.  Sometimes  trial-and-error  experimenta¬ 
tion  is  required,  but  often  the  acceptable  range  of  weights 
can  be  determined  by  examination.  This  is  a  kind  of  back- 
propagation  by  hand;  it  could  doubtless  be  automated. 

9  Summary 

I have proposed a new way to handle syntax for generation.
The proposal relies heavily on parallelism: part-wise
parallelism, competition, and cooperation. Also,
syntactic considerations are used in parallel with lexical and
world  knowledge  and  there  is  pervasive  interaction  among 
them.  This  promises  improved  output  quality  without  sacri¬ 
ficing  speed,  on  parallel  hardware.  The  proposal  also  relies 
heavily  on  emergents  —  it  does  not  make  syntactic  choices 
nor  build  up  representations  of  syntactic  structure.  The  net¬ 
work  representations  of  linguistic  knowledge  affect  word 
choice  and  order  directly. 

This  work  is  not  traditional  linguistics,  artificial  intelli¬ 
gence,  or  connectionism,  but  uses  techniques  from  all  three 
fields.  I  hope  this  will  stimulate  further  work  in  empirical 
computational  linguistics,  modeling  human  language  pro¬ 
duction,  and  building  useful  parallel  generation  systems. 

References 

Baars,  Bernard  K.  (1980).  The  Competing  Plans  Hypoth¬ 
esis:  an  heuristic  viewpoint  on  the  causes  of  errors  in 
speech.  In  Hans  W.  Dechert  &  Manfred  Raupach,  edi¬ 
tors,  Temporal  Variables  in  Speech.  Mouton. 


Chafe, Wallace L. (1980). The Deployment of Consciousness
in the Production of a Narrative. In Wallace L. Chafe,
editor, The Pear Stories. Ablex.

De Smedt, Koenraad J.M.J. (1990). Incremental Sentence
Generation: a computer model of grammatical encoding.
Technical Report 90-01, Nijmegen Institute for Cognition
Research and Information Technology.

Fillmore,  Charles  (1989a).  The  Mechanisms  of  “Construc¬ 
tion  Grammar”.  In  Proceedings  of  the  Berkeley  Linguistic 
Society,  volume  15. 

Fillmore, Charles (1989b). On Grammatical Constructions,
course notes, UC Berkeley Linguistics Department.

Finkler, Wolfgang & Günter Neumann (1989). POPEL-HOW:
A Distributed Parallel Model for Incremental Natural
Language Production with Feedback. In Proceedings
of the Eleventh International Joint Conference on
Artificial Intelligence. Detroit.

Gasser, Michael (1988). A Connectionist Model of Sentence
Generation in a First and Second Language.
Technical Report UCLA-AI-88-13, Los Angeles.

Kalita,  Jugal  &  Lokendra  Shastri  (1987).  Generation  of 
Simple  Sentences  in  English  Using  the  Connectionist 
Model  of  Computation.  In  9th  Cognitive  Science  Con¬ 
ference.  Lawrence  Erlbaum  Associates. 

Kitano,  Hiroaki  (1989).  A  Massively  Parallel  Model  of 
Natural  Language  Generation  for  Interpreting  Telephony: 
Almost  Concurrent  Processing  of  Parsing  and  Genera¬ 
tion.  In  Proceedings  of  the  Second  European  Workshop 
on  Natural  Language  Generation. 

Stemberger,  J.  P.  (1985).  An  Interactive  Activation  Model 
of  Language  Production.  In  Andrew  W.  Ellis,  edi¬ 
tor,  Progress  in  the  Psychology  of  Language,  Volume  I. 
Lawrence  Erlbaum  Associates. 

Stolcke,  Andreas  (1989).  Processing  Unification-based 
Grammars  in  a  Connectionist  Framework.  In  11th  Cogni¬ 
tive  Science  Conference.  Lawrence  Erlbaum  Associates. 

Ward, Nigel (1988). Issues in Word Choice. In Proceedings
12th COLING. Budapest.

Ward,  Nigel  (1989a).  Capturing  Intuitions  about  Human 
Language  Production.  In  Proceedings,  Cognitive  Science 
Conference.  Lawrence  Erlbaum  Associates.  Ann  Arbor. 

Ward, Nigel (1989b). On the Ordering of Decisions in Machine
Translation. In Proceedings of the Third Annual
Conference of the Japanese Society for Artificial
Intelligence, Tokyo.

Ward,  Nigel  (1989c).  Towards  Natural  Machine  Transla¬ 
tion.  In  Proceedings  of  the  EIC  Workshop  on  Artificial 
Intelligence,  Tokyo.  Institute  of  Electronics,  Information, 
and  Communication  Engineers.  Published  as  Technical 
Research  Report  AI89-30. 




